Dynamic Load Sharing With Unknown Memory Demand of Jobs in Clustered Compute Farms

Authors

  • Songqing Chen
  • Xiaodong Zhang
  • Li Xiao

Abstract

A compute farm is a pool of clustered workstations that provides high-performance computing services for CPU-intensive, memory-intensive, and I/O-active jobs. Existing load sharing schemes with memory considerations assume that jobs' memory demand sizes are known in advance or can be predicted from users' hints. This assumption greatly simplifies the design and implementation of memory-centric schemes, but it is not desirable in practice. Load sharing with unknown memory demands requires an insightful understanding of how dynamic job interactions affect memory systems, as well as effective use of dynamic system information. To address this issue, we present three new results and contributions in this study. (1) Using Linux kernel instrumentation, we collected different types of workload execution traces to quantitatively characterize job interactions, and we modeled page fault behavior as a function of the overloaded memory size and the amount of the jobs' I/O activity. (2) Based on these experimental results and the collected dynamic system information, we built a simulation model that accurately emulates memory system operations and job migrations with virtual memory considerations. (3) We propose a memory-centric load sharing scheme and its variations to effectively handle jobs' dynamic memory allocation demands. The scheme aims to minimize the execution time of each individual job and maximize distributed system throughput. It does this by dynamically migrating and remotely submitting jobs from heavily loaded machines to lightly loaded ones, which eliminates or reduces page faults and reduces the waiting time for CPU service. Using trace-driven simulations, we examined these load sharing policies and show their effectiveness. Our study strongly suggests that the load index should be designed to comprehensively consider all available resources, including CPUs, global memory, and network resources.
This work is supported in part by the National Science Foundation under grants CCR-9400719, CCR-9812187, and EIA9977030, by the Air Force Office of Scientific Research under grant AFOSR-95-1-0215, and by Sun Microsystems under grant EDUE-NAFO-980405.
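The abstract describes the memory-centric scheme only at a high level. The sketch below is a minimal illustration under stated assumptions. The scheduler never sees a job's declared memory demand, only the memory it has allocated so far. A node counts as memory-overloaded, and therefore likely to page fault, once that observed allocation exceeds its user memory. In that case the largest job is migrated, or a new job is remotely submitted, to the node with the most idle memory that still has a free CPU slot. All names (`Job`, `Node`, `pick_target`, `place_new_job`, `rebalance`), the per-node `cpu_threshold`, and the "migrate the largest job" heuristic are hypothetical. They are not the authors' published algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    # Hypothetical: demand is unknown at submission, so the scheduler only
    # tracks how much memory the job has allocated so far.
    observed_mem_mb: float = 0.0


@dataclass
class Node:
    name: str
    user_mem_mb: float   # memory available to user jobs on this node
    cpu_threshold: int   # hypothetical cap on concurrent jobs
    jobs: List[Job] = field(default_factory=list)

    def mem_load(self) -> float:
        return sum(j.observed_mem_mb for j in self.jobs)

    def idle_mem(self) -> float:
        return self.user_mem_mb - self.mem_load()

    def mem_overloaded(self) -> bool:
        # Page faults become likely once observed demand exceeds user memory.
        return self.mem_load() > self.user_mem_mb

    def has_cpu_slot(self) -> bool:
        return len(self.jobs) < self.cpu_threshold


def pick_target(cluster: List[Node], exclude: Node, need_mb: float) -> Optional[Node]:
    """Return the node (other than `exclude`) with the most idle memory that
    has a free CPU slot and at least `need_mb` idle memory, else None."""
    candidates = [n for n in cluster
                  if n is not exclude and n.has_cpu_slot() and n.idle_mem() >= need_mb]
    return max(candidates, key=lambda n: n.idle_mem(), default=None)


def place_new_job(job: Job, local: Node, cluster: List[Node]) -> Node:
    """Run locally if the home node has CPU and memory headroom; otherwise
    remotely submit to the node with the most idle memory (or stay local)."""
    if local.has_cpu_slot() and not local.mem_overloaded():
        target = local
    else:
        # The new job's demand is unknown, so only require that the target
        # is not already memory-overloaded (idle memory >= 0).
        target = pick_target(cluster, local, need_mb=0.0) or local
    target.jobs.append(job)
    return target


def rebalance(node: Node, cluster: List[Node]) -> Optional[str]:
    """If `node` is memory-overloaded, migrate its job with the largest
    observed allocation to a node that can hold it. Returns the migrated
    job's name, or None if no migration happened."""
    if not node.mem_overloaded() or not node.jobs:
        return None
    victim = max(node.jobs, key=lambda j: j.observed_mem_mb)
    target = pick_target(cluster, node, victim.observed_mem_mb)
    if target is None:
        return None
    node.jobs.remove(victim)
    target.jobs.append(victim)
    return victim.name


a = Node("A", user_mem_mb=100, cpu_threshold=4, jobs=[Job("j1", 80), Job("j2", 50)])
b = Node("B", user_mem_mb=200, cpu_threshold=4)
print(rebalance(a, [a, b]))                      # j1
print(place_new_job(Job("j3"), a, [a, b]).name)  # A
```

In the usage lines, node A starts with 130 MB of observed allocation against 100 MB of user memory, so its largest job moves to B. After that migration A has headroom again and keeps the next submission. A fuller load index, as the abstract recommends, would also weigh network cost before choosing a migration target.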

Similar resources

Dynamic Load Sharing with Unknown Memory Demands in Clusters

A compute farm is a pool of clustered workstations to provide high performance computing services for CPU-intensive, memory-intensive, and I/O active jobs in a batch mode. Existing load sharing schemes with memory considerations assume jobs’ memory demand sizes are known in advance or predictable based on users’ hints. This assumption can greatly simplify the designs and implementations of load...

Adaptive and Virtual Reconfigurations for Effective Dynamic Job Scheduling in Cluster Systems

In a cluster system with dynamic load sharing support, a job submission or migration to a workstation is determined by the availability of CPU and memory resources of the workstation at the time [3]. In such a system, a small number of running jobs with unexpectedly large memory allocation requirements may significantly increase the queuing delay times of the rest of the jobs with normal memory req...

Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands

The cluster system we consider for load sharing is a compute farm which is a pool of networked server nodes providing high-performance computing for CPU-intensive, memory-intensive, and I/O active jobs in a batch mode. Existing resource management systems mainly aim at balancing the usage of CPU loads among server nodes. With the rapid advancement of CPU chips, memory and disk access speed ...

Improving Distributed Workload Performance by Sharing both CPU and Memory Resources

We develop and examine job migration policies by considering effective usage of global memory in addition to CPU load sharing in distributed systems. When a node is identified as lacking sufficient memory space to serve jobs, one or more jobs of the node will be migrated to remote nodes with low memory allocations. If the memory space is sufficiently large, the jobs will be scheduled by a CPU-...

Local Cluster First Load Sharing Policy for Heterogeneous Clusters

This paper studies the load sharing problem among heterogeneous cluster systems. The heterogeneous clusters we consider are time-sharing, and the computers in these clusters have different CPU powers and memory capacities. Load sharing means evenly distributing workloads among all coordinated computers in the system. When some nodes become heavily loaded, it is necessary to migrate some jobs to the nodes wit...

Publication date: 2008